
    Comparing Task Simplifications to Learn Closed-Loop Object Picking Using Deep Reinforcement Learning

    Enabling autonomous robots to interact in unstructured environments with dynamic objects requires manipulation capabilities that can deal with clutter, changes, and objects' variability. This paper presents a comparison of different reinforcement learning-based approaches for object picking with a robotic manipulator. We learn closed-loop policies mapping depth camera inputs to motion commands and compare different approaches to keep the problem tractable, including reward shaping, curriculum learning, and using a policy pre-trained on a task with a reduced action set to warm-start the full problem. For efficient and more flexible data collection, we train in simulation and transfer the policies to a real robot. We show that, using curriculum learning, policies learned with a sparse reward formulation can be trained at rates similar to those with a shaped reward. These policies achieve success rates comparable to the policy initialized on the simplified task. We successfully transferred these policies to the real robot with only minor modifications to the depth image filtering. We found that using a heuristic to warm-start the training was useful for enforcing desired behavior, while the policies trained from scratch with a curriculum learned better to cope with unseen scenarios where objects are removed.
    Comment: 8 pages; video available at https://youtu.be/ii16Zejmf-
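    The abstract does not detail the training loop; the sketch below only illustrates the general idea of curriculum learning on a sparse-reward picking task. The environment and policy interfaces (set_difficulty, train_epoch, act, and the gym-style reset/step) are hypothetical placeholders under assumed conventions, not the authors' released code.

    def train_with_curriculum(env, policy, levels=10, success_threshold=0.8,
                              epochs_per_eval=5):
        """Advance to a harder task setting once the sparse-reward success
        rate on the current curriculum level is high enough."""
        level = 0
        env.set_difficulty(level)              # e.g. spawn area, clutter, object count
        while level < levels:
            for _ in range(epochs_per_eval):
                policy.train_epoch(env)        # any on-/off-policy RL update
            if evaluate(env, policy) >= success_threshold:
                level += 1                     # widen workspace, add objects, ...
                env.set_difficulty(level)
        return policy


    def evaluate(env, policy, episodes=50):
        """Fraction of episodes that end with a successful grasp (sparse reward of 1)."""
        successes = 0
        for _ in range(episodes):
            obs, done = env.reset(), False
            while not done:
                obs, reward, done, _ = env.step(policy.act(obs))
            successes += int(reward > 0)
        return successes / episodes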

    Incremental Object Database: Building 3D Models from Multiple Partial Observations

    Collecting 3D object datasets involves a large amount of manual work and is time-consuming. Getting complete models of objects either requires a 3D scanner that covers all surfaces of an object, or the object needs to be rotated to observe it completely. We present a system that incrementally builds a database of objects as a mobile agent traverses a scene. Our approach requires no prior knowledge of the shapes present in the scene. Object-like segments are extracted from a global segmentation map, which is built online using the input of segmented RGB-D images. These segments are stored in a database, matched against each other, and merged with previously observed instances. This allows us to create and improve object models on the fly and to use these merged models to also reconstruct unobserved parts of the scene. The database contains each (potentially merged) object model only once, together with a set of poses where it was observed. We evaluate our pipeline on one public dataset and on a newly created Google Tango dataset containing four indoor scenes, with some of the objects appearing multiple times both within and across scenes.
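    The abstract does not specify the matching and merging criteria; the following is a minimal sketch of the bookkeeping such a database needs, storing each (potentially merged) model once together with the poses where it was observed. The match_score and merge_models callables are user-supplied stand-ins for the geometric matching and model-fusion steps described in the paper.

    from dataclasses import dataclass, field


    @dataclass
    class ObjectEntry:
        model: object                                # e.g. a point cloud or mesh segment
        poses: list = field(default_factory=list)    # 6-DoF poses where it was observed


    class ObjectDatabase:
        """Stores each (potentially merged) object model only once,
        together with the set of poses where it was observed."""

        def __init__(self, match_score, merge_models, match_threshold=0.75):
            self.entries = []
            self.match_score = match_score        # segment similarity in [0, 1]
            self.merge_models = merge_models      # fuses two models into one
            self.match_threshold = match_threshold

        def insert(self, segment, pose):
            """Merge a newly extracted segment into its best match, or add a new entry."""
            best, best_score = None, 0.0
            for entry in self.entries:
                score = self.match_score(entry.model, segment)
                if score > best_score:
                    best, best_score = entry, score
            if best is not None and best_score >= self.match_threshold:
                best.model = self.merge_models(best.model, segment)
                best.poses.append(pose)
                return best
            entry = ObjectEntry(model=segment, poses=[pose])
            self.entries.append(entry)
            return entry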

    Online Incremental Object-Based Mapping for Mobile Manipulation

    Robotic systems have shown impressive results at navigating in previously mapped areas, in particular in the domain of assisted (and autonomous) driving. As these systems do not physically interact with the environment, their map representations are optimized for precise localization rather than for rapidly changing scenes; environment changes are only incorporated into these maps when observed repeatedly. When physical interaction between the robot and the environment is required, however, it is crucial that the map representation is consistent with the world at all times. For instance, the new location of a manipulated (or externally moved) object must be constantly updated in the map. In this thesis, we argue that object-based maps are a more suitable representation for this purpose. Our solutions build on the hypothesis that object-based representations can deal with change and, because they contain or gather knowledge about physical objects, capture which parts of the environment can be jointly modified. This thesis aims to find environment representations that are well suited to robotic mobile manipulation tasks.

    We start by creating a system that takes measurements from localized RGB-D cameras and integrates them into an instance-based segmentation map. Each incoming depth frame is segmented with a geometric approach into locally convex segments, which are integrated into a 3D voxel grid as a Truncated Signed Distance Field (TSDF) with an associated instance label. By updating these labels as new segments are integrated, a consistent segmentation map is formed. Each segment is stored with its observed position in a 3D object model database, which represents the environment using object-like segments. In addition to representing the environment, the database can be used to match and merge newly extracted map segments and to complete the scene when repeating instances appear or when an instance has been observed in a previous session.

    To acquire such maps and to enable robots to interact with the environment, we show that it is beneficial to fuse information from multiple sensor modalities. For instance, cameras have proven to be a great source for creating sparse localization maps, whereas depth sensors can produce dense reconstructions even in textureless regions of the environment. Before using multiple sensors together, however, a challenging problem is to spatially and temporally align their measurements. Hence, we focus on bringing robotic actuators and multiple sensors into a common spatial and temporal frame, so that measurements can be fused and the robot can act and interact in that frame. We show how filtering and optimization techniques improve initial time synchronizations and hand-eye calibrations.

    Next, we use the tools and techniques developed for the mapping and object discovery tasks in the context of manipulation. Using a set of rocks, we aim to form vertical balancing towers with a robotic arm equipped with a wrist-mounted RGB-D camera. After identifying previously scanned rocks in a tabletop scene, we run a set of physics-engine simulations with the detected objects to assess the stability of possible stack configurations. In a greedy manner, we select the next best rock to place and then find and execute a grasping and placing motion.

    The segmentation map presented in this thesis allows the extraction of single geometric instances in a priori unknown environments. An incremental object database is built, which can match and merge re-observed or repeating object segments. These merged instances improve the raw extracted 3D models over time and, finally, even allow unobserved parts of the scene to be completed. Compelling results are demonstrated in extracting and creating singulated object models from RGB-D images of household objects in cluttered warehouse distribution box scenes, furniture in indoor scenes, and cars in a parking garage. We show that our matching approach can identify an object's pose in a scene accurately enough to solve delicate manipulation tasks. Together with a newly introduced greedy next-best-object target pose planning algorithm, we can stack stones into vertical balancing towers. We demonstrate that our new hand-eye calibration framework is applicable to many different robotic use cases: the integration of a time-alignment step removes the burden of manually obtaining time-aligned pose sets, while filtering and optimization techniques improve calibration results on all evaluated datasets.
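    As an illustration of the greedy next-best-object planning described above, the sketch below evaluates every remaining rock and candidate placement pose with a simulated stability check and keeps the best option. The pose sampler and stability function are hypothetical stand-ins for the physics-engine rollout and pose sampling used in the thesis, whose exact form the abstract does not give.

    def plan_next_placement(current_stack, remaining_rocks,
                            sample_placement_poses, simulate_stack_stability):
        """Greedily pick the (rock, pose) whose simulated stack is most stable.

        current_stack            list of (rock, pose) already placed
        remaining_rocks          rocks not yet placed
        sample_placement_poses   callable returning candidate poses on top of the stack
        simulate_stack_stability callable returning a stability score for a full stack
        """
        best_choice, best_stability = None, 0.0
        for rock in remaining_rocks:
            for pose in sample_placement_poses(current_stack, rock):
                stability = simulate_stack_stability(current_stack + [(rock, pose)])
                if stability > best_stability:
                    best_choice, best_stability = (rock, pose), stability
        return best_choice    # None if no candidate keeps the tower standing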